Moran's I in PySAL

A spatial autocorrelation tutorial using Swedish regional data

🧭 What is Moran's I?

Moran's I is the most widely used measure of spatial autocorrelation β€” it tells us whether similar values tend to cluster together in space, or whether they are randomly or negatively arranged.

$$I = \dfrac{N}{W} \cdot \dfrac{\displaystyle\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\displaystyle\sum_{i}(x_i - \bar{x})^2}$$
NNumber of spatial units (regions)
WSum of all spatial weights wα΅’β±Ό
wSpatial weight matrix β€” encodes "who is neighbor to whom"
xVariable of interest (e.g., unemployment rate)
Interpretation
I β‰ˆ +1 β†’ Strong positive autocorrelation (clusters)
I β‰ˆ 0  β†’ Spatial randomness
I β‰ˆ βˆ’1 β†’ Dispersed / checkerboard pattern
Expected value under randomness: E[I] = βˆ’1/(Nβˆ’1) β‰ˆ 0 for large N

πŸ‡ΈπŸ‡ͺ The Data: Swedish NUTS-2 Regions

Sweden is divided into 8 NUTS-2 regions. We'll use simulated unemployment rates (%) that mirror real north–south patterns, where the more urbanized south tends to have lower unemployment.

RegionCodeUnemployment %
High (>7%)
Medium
Low (<5%)

Hover over regions to explore

πŸ”— Step 1: Build the Spatial Weights Matrix

The weights matrix W defines which regions are neighbors. The simplest approach is queen contiguity (shared border or corner). In PySAL, we use libpysal:

import geopandas as gpd import libpysal from esda.moran import Moran import splot.esda as splot import matplotlib.pyplot as plt # Load Swedish NUTS-2 shapefile (or GeoPackage) sweden = gpd.read_file("sweden_nuts2.gpkg") # Create Queen contiguity weights w = libpysal.weights.Queen.from_dataframe(sweden) w.transform = 'r' # row-standardize: each row sums to 1 # Inspect print(w.n) # 8 regions print(w.mean_neighbors) # average neighbors per region
Row-standardization (transform='r') means each weight wα΅’β±Ό = 1/nα΅’ where nα΅’ is the number of neighbors of i. This makes the spatial lag a proper local average of neighbors.

Weights Matrix (Queen Contiguity)

πŸ“ Step 2: Compute Moran's I

# Compute Global Moran's I mi = Moran(sweden['unemployment'], w) print(f"Moran's I = {mi.I:.4f}") print(f"Expected I = {mi.EI:.4f}") print(f"p-value = {mi.p_sim:.4f}") # based on 999 permutations print(f"z-score = {mi.z_sim:.4f}")
+0.612
Moran's I
0.012
p-value (999 perms)
Interpretation: I = 0.612 is substantially above the expected value (β‰ˆ 0). The p-value of 0.012 means there is only a 1.2% chance of observing this level of clustering by random chance. Unemployment rates in Sweden are significantly clustered in space. High-unemployment regions border other high-unemployment regions (mainly in the north), and low-unemployment regions cluster in the south.

πŸ“Š Step 3: The Moran Scatter Plot

The Moran scatter plot shows each region's standardized value (z) on the X axis vs. its spatial lag (average of neighbors' z-scores) on the Y axis. The slope of the regression line is Moran's I.

# Plot Moran scatter plot using splot fig, ax = splot.esda.moran_scatterplot(mi, aspect_equal=True) plt.title("Moran Scatter Plot – Swedish Unemployment") plt.show()
HH (High-High)
LH
HL
LL (Low-Low)

Quadrant Meanings

Q1 – High-High (HH): Region has high unemployment AND its neighbors do too. Spatial cluster of disadvantage.
Q2 – Low-High (LH): Region is low but surrounded by high. Spatial outlier.
Q3 – Low-Low (LL): Region and its neighbors all have low unemployment. Cluster of prosperity.
Q4 – High-Low (HL): High region surrounded by low. Spatial outlier.

The slope of the red line = Moran's I β‰ˆ 0.612

πŸ” Step 4: Local Moran's I (LISA)

Global Moran's I gives one number for the whole country. Local Indicators of Spatial Association (LISA) decompose it to identify where clusters are.

from esda.moran import Moran_Local # Compute Local Moran's I lm = Moran_Local(sweden['unemployment'], w) # lm.q tells us the quadrant: 1=HH, 2=LH, 3=LL, 4=HL # lm.p_sim gives the pseudo p-value per region sweden['lisa_q'] = lm.q sweden['lisa_p'] = lm.p_sim sweden['sig'] = lm.p_sim < 0.05 # Plot significance map splot.esda.lisa_cluster(lm, sweden, p=0.05) plt.show()
Result for Sweden: Northern regions (Norra Mellansverige, Γ–vre Norrland) appear as HH clusters β€” significantly high unemployment surrounded by high unemployment. The Stockholm region appears as part of an LL cluster β€” low unemployment in a prosperous neighborhood.

πŸš€ Complete PySAL Workflow

import geopandas as gpd import libpysal from esda.moran import Moran, Moran_Local import splot.esda as splot import matplotlib.pyplot as plt import numpy as np # ── 1. Load data ───────────────────────────────────────────── sweden = gpd.read_file("sweden_nuts2.gpkg") y = sweden["unemployment"] # ── 2. Weights ──────────────────────────────────────────────── w = libpysal.weights.Queen.from_dataframe(sweden) w.transform = "r" # ── 3. Global Moran's I ─────────────────────────────────────── mi = Moran(y, w) print(f"I={mi.I:.3f}, p={mi.p_sim:.3f}") fig, axs = plt.subplots(1, 2, figsize=(12, 5)) # Choropleth map sweden.plot(column="unemployment", ax=axs[0], cmap="YlOrRd", legend=True) axs[0].set_title("Unemployment Rate (%)") # Moran scatter plot splot.esda.moran_scatterplot(mi, ax=axs[1]) axs[1].set_title(f"Moran Scatter (I={mi.I:.3f})") plt.tight_layout() plt.show() # ── 4. Local Moran's I (LISA) ──────────────────────────────── lm = Moran_Local(y, w, seed=42) fig2, ax2 = plt.subplots(figsize=(6, 8)) splot.esda.lisa_cluster(lm, sweden, p=0.05, ax=ax2) ax2.set_title("LISA Cluster Map – Sweden") plt.show()

βœ… Key Takeaways

1Spatial weights encode the neighborhood structure β€” always row-standardize for Moran's I
2Global Moran's I gives a single summary statistic for the whole study area
3Significance is assessed by permutation (999 random shuffles), not parametric tables
4Local Moran's I (LISA) identifies specific cluster locations and types (HH, LL, HL, LH)
Sweden's unemployment story: The north–south gradient in Swedish unemployment isn't random β€” it's spatially structured. Moran's I β‰ˆ 0.61 with p < 0.05 tells us this pattern is unlikely to have arisen by chance.
Libraries used: libpysal, esda, splot, geopandas, matplotlib. Install with: pip install pysal splot geopandas